Construction of a geographically weighted nonparametric regression model fit test

Geographically Weighted Nonparametric Regression (GWNR), one extension of Geographically Weighted Regression (GWR), has more parameters than the GWR model. Models with more parameters usually fit the data better, whereas models with fewer parameters are easier to use and interpret. A model with more parameters should therefore be used only if it is shown to be significantly superior. The purpose of this study was to develop a goodness-of-fit hypothesis test for the GWNR model. The test was applied to real data, and we found that the GWNR model was more suitable than the mixed nonparametric regression model. Some highlights of the proposed method are:
• A new GWR model that handles an unknown regression function by using a mixed truncated spline and Fourier series estimator in nonparametric regression
• A goodness-of-fit test for GWNR that tests model fit between the mixed nonparametric regression model and GWNR
• Application of the goodness-of-fit test to poverty data on Sulawesi Island and infant mortality data in East Java


Introduction
One popular spatial data analysis technique is Geographically Weighted Regression (GWR), which has a wide range of applications, and many researchers have developed the model further. For example, [1] developed GWR for the negative binomial distribution, [2] developed Multiscale GWR, [3][4][5] discussed semiparametric GWR models, and [6] built a model fit test for GWPolR (Geographically Weighted Polynomial Regression). However, these GWR developments are still limited to parametric regression. According to Eubank [7], nonparametric regression is more flexible than parametric regression because it makes no assumption about the shape of the function that connects the independent and dependent variables; instead, the function adapts to the actual data. Research on nonparametric regression with both single estimators [8][9][10] and mixed estimators [11][12][13] has been widely conducted. Nonparametric regression has also been extended to spatial data: [14][15][16] studied a new method for the GWR truncated spline model, and [17] developed nonparametric GWR for spatial data.
Several references, such as [18] and [19], have shown that the GWR model can be expressed as

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{K} \beta_k(u_i, v_i)\, x_{ki} + \varepsilon_i, \qquad i = 1, 2, \ldots, n, \qquad (1)$$

where $(y_i, x_{1i}, \ldots, x_{Ki})$ are the observed values of the response variable $y$ and the predictor variables $x_1, x_2, \ldots, x_K$ at location $(u_i, v_i)$; $\beta_0(u_i, v_i)$ is the intercept and $\beta_k(u_i, v_i)$, for $k = 1, 2, \ldots, K$, is the regression coefficient of $x_k$ at location $(u_i, v_i)$; and $\varepsilon_i$, $i = 1, 2, \ldots, n$, is the model error, assumed to follow the Normal distribution with mean 0 and variance $\sigma^2$, usually written as $\varepsilon_i \sim N(0, \sigma^2)$. GWR is thus a modelling technique whose regression coefficients vary by location. However, at each location, all predictor variables are related to the response through a linear function. In practice, not every predictor variable has a linear relationship with the response: some predictors may follow a nonlinear pattern or no clear pattern at all. To overcome this issue, [17] extended the model using a mixed nonparametric regression approach and called it Geographically Weighted Nonparametric Regression (GWNR).
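To make Eq. (1) concrete, the sketch below estimates the local coefficients $\beta_k(u_i, v_i)$ at a single location by weighted least squares, which is the usual way GWR coefficients are computed. The Gaussian kernel, the bandwidth value and the simulated data are assumptions chosen for illustration; the text above does not specify a weighting function.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Diagonal of an assumed weight matrix W(u_i, v_i): Gaussian kernel of distances to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_coefficients(X, y, coords, i, bandwidth):
    """Weighted least squares estimate of (beta_0, ..., beta_K) at location (u_i, v_i).

    X must contain a leading column of ones for the intercept beta_0.
    """
    w = gaussian_weights(coords, i, bandwidth)
    XtW = X.T * w                      # equals X^T W(u_i, v_i) for a diagonal W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Simulated example: the slope of x1 drifts with the first coordinate u_i.
rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + slope * x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1])

for i in range(3):
    print(i, gwr_coefficients(X, y, coords, i, bandwidth=2.0), "true slope:", slope[i])
```

Because every location gets its own weight vector, the same data produce a different coefficient vector at each $(u_i, v_i)$, which is exactly the location dependence expressed in Eq. (1).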
The GWNR model develops the GWR model by using a nonparametric regression function. The purpose of this function is to better accommodate samples in which one or more predictor variables have an unknown relationship with the response [20][21][22]. Consequently, the GWNR model has more parameters than the GWR model. In general, a model with more parameters fits the data better, while a model with fewer parameters is easier to apply and interpret. However, if the model with more parameters is significantly better than the one with fewer parameters, it should be used. It is therefore necessary to test the significance of the GWNR model fit, and the purpose of this paper is to construct a hypothesis test of GWNR model fit.

Geographically Weighted Nonparametric Regression Model
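According to [17], the GWNR model is an extension of the GWR model in Eq. (1). Based on Eq. (4), the vector of estimated values of the GWNR model for the response variable $y$ at the $n$ locations can be expressed as

$$\hat{\mathbf{y}} = \mathbf{G}\mathbf{y}, \qquad (5)$$

where $\mathbf{G}$ is called the hat matrix of the GWNR model. Based on Eq. (5), the residual vector can be written as

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I} - \mathbf{G})\mathbf{y}. \qquad (6)$$

Furthermore, from Eq. (6), the Sum of Squared Errors (SSE) of the GWNR model is

$$SSE_{GWNR} = \hat{\boldsymbol{\varepsilon}}^{T}\hat{\boldsymbol{\varepsilon}} = \mathbf{y}^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\mathbf{y}, \qquad (7)$$

where $\mathbf{I}$ is the identity matrix of order $n$.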
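The sketch below shows how the fitted values, residual vector and SSE described above are computed once a hat matrix is available. Because the GWNR estimator itself is not reproduced here, it assumes a hypothetical linear smoother: at each location, a weighted least squares fit on a basis matrix `B` (which could hold spline or Fourier columns) with Gaussian kernel weights. Any estimator whose fitted values are linear in $\mathbf{y}$ gives a $\mathbf{G}$ that can be passed to `sse_from_hat` in the same way.

```python
import numpy as np

def local_hat_matrix(B, coords, bandwidth):
    """Hypothetical hat matrix G: row i holds b_i^T (B^T W_i B)^{-1} B^T W_i.

    B is an n x p basis matrix; W_i uses an assumed Gaussian kernel.
    """
    n = B.shape[0]
    G = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        BtW = B.T * w                                  # B^T W_i
        G[i] = B[i] @ np.linalg.solve(BtW @ B, BtW)    # weights giving the i-th fitted value
    return G

def sse_from_hat(G, y):
    """Fitted values, residual vector and SSE for a linear smoother G."""
    I = np.eye(len(y))
    y_hat = G @ y                            # fitted values
    resid = (I - G) @ y                      # residual vector
    sse = y @ (I - G).T @ (I - G) @ y        # y^T (I - G)^T (I - G) y
    return y_hat, resid, sse

rng = np.random.default_rng(3)
n = 40
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.uniform(0, 5, size=n)
B = np.column_stack([np.ones(n), x, np.maximum(x - 2.5, 0.0)])  # linear spline, knot at 2.5
y = np.sin(x) + 0.1 * coords[:, 1] + rng.normal(scale=0.2, size=n)

G = local_hat_matrix(B, coords, bandwidth=2.5)
y_hat, resid, sse = sse_from_hat(G, y)
print(sse, np.sum(resid ** 2))
```

The quadratic-form SSE and the plain sum of squared residuals printed at the end should agree, which is the identity used in the SSE formula.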

GWNR Model Fit Test
In this section, a GWNR model fit test statistic based on the SSE of the model is constructed. The approximate distribution of the test statistic is then investigated in order to test whether the GWNR model describes the sample data significantly better than the mixed nonparametric regression (MNR) model. Two assumptions are used:

Assumption 1. The model errors $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are independent and Normally distributed with mean 0 and variance $\sigma^2$, that is, $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$.

Assumption 2. Let $\hat{y}_i$ be the estimated value of $y_i$ at location $(u_i, v_i)$. Then $\hat{y}_i$ is an unbiased estimator of $E(y_i)$, that is,

$$E(\hat{y}_i) = E(y_i), \qquad i = 1, 2, \ldots, n.$$
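Suppose the sum of squared errors of the GWNR model is denoted by $SSE_{GWNR}$; then

$$SSE_{GWNR} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \mathbf{y}^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\mathbf{y}.$$

Furthermore, an unbiased estimator of the error variance is given in Theorem 1 below.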
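Theorem 1. Suppose the GWNR model satisfies Assumption 1 and Assumption 2, and $\hat{\boldsymbol{\eta}}(u_i, v_i)$ is the parameter estimator at location $(u_i, v_i)$. Then an unbiased estimator of the error variance $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{SSE_{GWNR}}{\delta_1}, \qquad \text{with} \qquad \delta_1 = \operatorname{tr}\left[(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\right].$$

Proof. Since $E(\hat{\mathbf{y}}) = \mathbf{G}E(\mathbf{y})$, Assumption 2 gives $(\mathbf{I} - \mathbf{G})E(\mathbf{y}) = E(\mathbf{y}) - E(\hat{\mathbf{y}}) = \mathbf{0}$. Hence $(\mathbf{I} - \mathbf{G})\mathbf{y} = (\mathbf{I} - \mathbf{G})\boldsymbol{\varepsilon}$ and

$$SSE_{GWNR} = \boldsymbol{\varepsilon}^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\boldsymbol{\varepsilon},$$

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^{T}$ is the model error vector. By Assumption 1, it follows that

$$E(SSE_{GWNR}) = \sigma^2 \operatorname{tr}\left[(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\right] = \delta_1\sigma^2, \qquad (11)$$

where $\delta_1 = \operatorname{tr}[(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})]$. Based on Eq. (11), an unbiased estimator of $\sigma^2$ is $\hat{\sigma}^2 = SSE_{GWNR}/\delta_1$. Operationally, $\delta_1$ can be computed as

$$\delta_1 = n - 2\operatorname{tr}(\mathbf{G}) + \operatorname{tr}(\mathbf{G}^{T}\mathbf{G}).$$

Based on Theorem 1, the quantity $SSE_{GWNR}/\delta_1$ can be used to estimate the error variance $\sigma^2$. It can also be used to measure how well the GWNR model fits the sample data: the smaller its value, the better the model fits. However, model fit testing requires the distribution of $SSE_{GWNR}/\sigma^2$. An approximation to this distribution is given in Theorem 2 below.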
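Theorem 1 can be checked numerically. The sketch below uses a hypothetical Nadaraya-Watson style kernel smoother as $\mathbf{G}$ (each row sums to 1, so a constant mean is reproduced exactly and Assumption 2 holds for it), verifies that the trace definition of $\delta_1$ and its operational form agree, and averages $\hat{\sigma}^2$ over simulated samples with true $\sigma^2 = 4$. The kernel, bandwidth and simulation settings are illustrative assumptions, not the GWNR estimator.

```python
import numpy as np

def kernel_smoother_hat(coords, bandwidth):
    """Hypothetical hat matrix: spatial kernel smoother whose rows sum to 1."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    K = np.exp(-0.5 * (d / bandwidth) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def delta1(G):
    """delta_1 = tr[(I - G)^T (I - G)] from Theorem 1."""
    I = np.eye(G.shape[0])
    return np.trace((I - G).T @ (I - G))

def delta1_operational(G):
    """Equivalent form n - 2 tr(G) + tr(G^T G)."""
    n = G.shape[0]
    return n - 2.0 * np.trace(G) + np.trace(G.T @ G)

def sigma2_hat(G, y):
    """Unbiased error-variance estimator SSE_GWNR / delta_1."""
    resid = y - G @ y
    return resid @ resid / delta1(G)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(40, 2))
G = kernel_smoother_hat(coords, bandwidth=1.5)

# Constant mean 5.0 is reproduced exactly by G, so E(SSE) = delta_1 * sigma^2.
sigma = 2.0
estimates = [sigma2_hat(G, 5.0 + rng.normal(scale=sigma, size=40)) for _ in range(4000)]
print(delta1(G), delta1_operational(G), np.mean(estimates))
```

The average of the simulated $\hat{\sigma}^2$ values should lie close to 4; dividing by $n$ instead of $\delta_1$ would bias it downward.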

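Theorem 2. Suppose the GWNR model meets Assumption 1 and Assumption 2. Then the distribution of $SSE_{GWNR}/\sigma^2$ can be approximated by $c\,\chi^2_r$, where

$$c = \frac{\delta_2}{\delta_1}, \qquad r = \frac{\delta_1^2}{\delta_2}, \qquad \delta_j = \operatorname{tr}\left(\left[(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\right]^{j}\right), \quad j = 1, 2.$$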
Proof. From Eq. (18), $SSE_{GWNR}/\sigma^2$ can be expressed as a quadratic form in the Normal variable $\boldsymbol{\varepsilon}/\sigma$:

$$\frac{SSE_{GWNR}}{\sigma^2} = \left(\frac{\boldsymbol{\varepsilon}}{\sigma}\right)^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\left(\frac{\boldsymbol{\varepsilon}}{\sigma}\right),$$

where $(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})$ is a symmetric positive semidefinite matrix. By the distribution theory of quadratic forms, a quadratic form $\mathbf{z}^{T}\mathbf{A}\mathbf{z}$ of a standard Normal vector $\mathbf{z} \sim N(\mathbf{0}, \mathbf{I})$ with symmetric $\mathbf{A}$ is $\chi^2$-distributed if and only if $\mathbf{A}$ is idempotent [7]. It is known that $\boldsymbol{\varepsilon}/\sigma \sim N(\mathbf{0}, \mathbf{I})$, but the matrix $(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})$ is generally not idempotent, because it is formed from the weight matrix $\mathbf{W}(u_i, v_i)$, which differs at each location $i$. As a result, $SSE_{GWNR}/\sigma^2$ does not have an exact $\chi^2$ distribution, but its distribution can be approximated by a $\chi^2$ distribution. Based on quadratic form distribution theory [14], the quadratic form (18) can be approximated by multiplying a $\chi^2_r$ variable by a constant $c$, written as $c\chi^2_r$. The constants $c$ and $r$ are chosen so that the mean and variance of $SSE_{GWNR}/\sigma^2$ match those of the approximating $c\chi^2_r$. For a $\chi^2_r$ variable, the mean and variance are $r$ and $2r$, respectively; therefore, the mean and variance of $c\chi^2_r$ are $cr$ and $2c^2 r$, respectively.
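For the quadratic form $SSE_{GWNR}/\sigma^2$, Eq. (11) shows that the mean is $\delta_1$. The variance is derived as follows. Since $(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})$ is symmetric and positive semidefinite, there exists an orthogonal matrix $\mathbf{P}$ of order $n$ such that

$$\mathbf{P}^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\mathbf{P} = \boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),$$

where the main diagonal elements of $\boldsymbol{\Lambda}$ are the eigenvalues of $(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})$. Let

$$\mathbf{z} = (z_1, z_2, \ldots, z_n)^{T} = \mathbf{P}^{T}\frac{\boldsymbol{\varepsilon}}{\sigma}.$$

By the properties of the multivariate Normal distribution, $z_1, z_2, \ldots, z_n$ are independent and identically distributed standard Normal random variables, $z_i \sim N(0, 1)$. On the other hand, Eq. (17) gives $\boldsymbol{\varepsilon}/\sigma = \mathbf{P}\mathbf{z}$, and substituting into Eq. (15) gives

$$\frac{SSE_{GWNR}}{\sigma^2} = \mathbf{z}^{T}\boldsymbol{\Lambda}\mathbf{z} = \sum_{i=1}^{n}\lambda_i z_i^2.$$

Because $z_i \sim N(0, 1)$, each $z_i^2 \sim \chi^2_1$ has variance 2, so

$$\operatorname{Var}\left(\frac{SSE_{GWNR}}{\sigma^2}\right) = 2\sum_{i=1}^{n}\lambda_i^2 = 2\delta_2, \qquad (20)$$

where $\delta_2 = \operatorname{tr}\left([(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})]^2\right)$. Based on Eqs. (11) and (20), the following system of equations is formed:

$$cr = \delta_1, \qquad 2c^2 r = 2\delta_2. \qquad (21)$$

The solution of the system (21) is $c = \delta_2/\delta_1$ and $r = \delta_1^2/\delta_2$. Thus, the distribution of $SSE_{GWNR}/\sigma^2$ can be approximated by $(\delta_2/\delta_1)\,\chi^2_{r}$ with $r = \delta_1^2/\delta_2$; equivalently, $(\delta_1/\delta_2)\,SSE_{GWNR}/\sigma^2$ approximately follows $\chi^2_{\delta_1^2/\delta_2}$.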
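The two steps of this proof can be illustrated numerically. The sketch below first contrasts a global least squares projection $\mathbf{H}$, for which $(\mathbf{I} - \mathbf{H})^{T}(\mathbf{I} - \mathbf{H})$ is idempotent, with a hypothetical location-weighted kernel smoother $\mathbf{G}$, for which it generally is not. It then computes $\delta_1$, $\delta_2$, $c$ and $r$ by two-moment (Satterthwaite-type) matching, confirms that $\delta_1$ and $\delta_2$ equal the sum of the eigenvalues and of their squares, and checks by simulation with $\sigma = 1$ that the quadratic form has mean close to $\delta_1$ and variance close to $2\delta_2$. The smoother, bandwidth and sample sizes are assumptions for illustration only.

```python
import numpy as np

def is_idempotent(A, tol=1e-8):
    """True if A @ A equals A up to numerical tolerance."""
    return np.allclose(A @ A, A, atol=tol)

def moment_match_chi2(G):
    """Return delta_1, delta_2, c and r so that SSE/sigma^2 is approx c * chi2_r."""
    I = np.eye(G.shape[0])
    M = (I - G).T @ (I - G)
    delta1 = np.trace(M)
    delta2 = np.trace(M @ M)
    return delta1, delta2, delta2 / delta1, delta1 ** 2 / delta2

rng = np.random.default_rng(4)
n = 40
I = np.eye(n)

# Global least squares projection H: (I - H)^T (I - H) is idempotent.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.solve(X.T @ X, X.T)
M_ols = (I - H).T @ (I - H)

# Hypothetical location-weighted smoother G: (I - G)^T (I - G) generally is not.
coords = rng.uniform(0, 10, size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
K = np.exp(-0.5 * (d / 1.5) ** 2)
G = K / K.sum(axis=1, keepdims=True)
M = (I - G).T @ (I - G)
print(is_idempotent(M_ols), is_idempotent(M))

delta1, delta2, c, r = moment_match_chi2(G)
lam = np.linalg.eigvalsh(M)          # diagonal elements of Lambda

# Monte Carlo with sigma = 1: q = eps^T M eps.
eps = rng.normal(size=(20000, n))
q = np.einsum("si,ij,sj->s", eps, M, eps)
print(delta1, q.mean(), 2 * delta2, q.var())
print("c =", c, "r =", r)
```

Matching only the first two moments keeps the approximation cheap (two traces), at the cost of ignoring higher-order shape differences between $\sum_i \lambda_i z_i^2$ and a scaled $\chi^2$.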

GWNR Model Fit Test Statistic and Its Distribution
The sum of squared errors and its approximate distribution obtained in Theorem 2 are now used to test model fit, that is, to determine whether the GWNR model describes the data significantly better than the MNR model. The hypotheses of the model fit test are:

$H_0$: the MNR and GWNR models fit the data equally well;
$H_1$: the GWNR model fits the data significantly better than the MNR model.

If the MNR model is used to model the sample data, its sum of squared errors can be expressed as

$$SSE_{MNR} = \mathbf{y}^{T}(\mathbf{I} - \mathbf{L})^{T}(\mathbf{I} - \mathbf{L})\mathbf{y},$$

where $\mathbf{L}$ is the hat matrix of the MNR model. According to [19], the distribution of $SSE_{MNR}/\sigma^2$ can be approximated by a $\chi^2_d$ distribution, with the degrees of freedom $d$ given in [19]. In distribution theory, if $U \sim \chi^2_r$ and $V \sim \chi^2_d$ are independent, then $(U/r)/(V/d)$ follows the F distribution with $r$ numerator and $d$ denominator degrees of freedom. Therefore, the distribution of the test statistic $F_1^{*}$ can be approximated by the F distribution with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom. Larger values of $F_1^{*}$ give stronger support for rejecting $H_0$, which means that the GWNR model is significantly more suitable than the MNR model. Therefore, at a given significance level $\alpha$, $H_0$ is rejected if $F_1^{*} > F(\alpha; \nu_1, \nu_2)$, and it is concluded that the GWNR model fits significantly better than the MNR model.
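The decision rule above can be sketched with SciPy's F distribution. The observed statistic $F_1^{*}$ and the degrees of freedom $\nu_1$ and $\nu_2$ are passed in as inputs, since their formulas (Theorem 3) are not reproduced in this section; the numbers in the usage line are made up purely for illustration. `scipy.stats.f` accepts non-integer degrees of freedom, which is what moment matching typically produces.

```python
from scipy import stats

def f_test_decision(f_stat, nu1, nu2, alpha=0.05):
    """Reject H0 when F1* exceeds the upper-alpha critical point F(alpha; nu1, nu2)."""
    critical = stats.f.ppf(1.0 - alpha, nu1, nu2)   # upper-tail critical value
    p_value = stats.f.sf(f_stat, nu1, nu2)          # P(F > f_stat)
    return f_stat > critical, critical, p_value

# Illustrative (made-up) inputs, not results from the paper.
reject, critical, p = f_test_decision(f_stat=4.8, nu1=6.3, nu2=27.5, alpha=0.05)
print(reject, round(critical, 3), round(p, 4))
```

Comparing against the critical value and checking whether the p-value falls below $\alpha$ are equivalent; returning both makes the result easier to report.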

Application to Data
Two cases were used in this research: poverty on Sulawesi Island, covering 81 districts and cities (the first case), and the number of infant deaths in East Java, covering 38 districts and cities (the second case). The analysis steps are described below.
The scatter plots of the response variable (Y) against the four predictor variables X1, X2, X3, and X4 are shown in Fig. 1. Visually, the relationships between the predictor variables and the response variable do not follow any known pattern. One alternative for such data is a nonparametric regression model. In this study, a mixed estimator is used in the GWNR model, namely a mixed truncated spline and Fourier series estimator. The estimated MNR model for the first case and the estimated GWNR models for selected locations in the first case, such as Kendari City, are given in Eqs. (27), (28), and (29). Table 2 shows that, for the number of infant deaths data, the GWNR model with a mixed truncated spline and Fourier series estimator performs better in terms of MSE and R² values.
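To give a feel for the mixed estimator mentioned above, the sketch builds a design matrix that combines a truncated spline component for one predictor with a Fourier series component (a linear term plus cosine terms) for another, and fits it globally by ordinary least squares, as an MNR-type fit would. Which predictor gets which component, the knot locations, the spline degree and the number of cosine terms are all hypothetical choices that in practice must be selected from the data; this is not the paper's estimation code.

```python
import numpy as np

def truncated_spline_basis(x, knots, degree=1):
    """Columns x, ..., x^degree and (x - k)_+^degree for each knot k."""
    cols = [x ** p for p in range(1, degree + 1)]
    cols += [np.where(x > k, (x - k) ** degree, 0.0) for k in knots]
    return np.column_stack(cols)

def fourier_basis(z, n_terms):
    """Columns z, cos(z), cos(2z), ..., cos(n_terms * z)."""
    cols = [z] + [np.cos(j * z) for j in range(1, n_terms + 1)]
    return np.column_stack(cols)

def mixed_design(x, z, knots, n_terms, degree=1):
    """Intercept + truncated spline in x + Fourier series in z (hypothetical split)."""
    return np.column_stack([np.ones(len(x)),
                            truncated_spline_basis(x, knots, degree),
                            fourier_basis(z, n_terms)])

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0, 10, size=n)
z = rng.uniform(0, 2 * np.pi, size=n)
y = np.abs(x - 5) + np.cos(2 * z) + rng.normal(scale=0.3, size=n)

B = mixed_design(x, z, knots=[3.0, 5.0, 7.0], n_terms=2)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ coef
print(B.shape, np.mean((y - y_hat) ** 2))
```

In a GWNR-type fit, the same basis matrix would instead be fitted with location-specific weights, giving one coefficient vector per location.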

Conclusion
The GWNR model fit test is constructed to determine whether the GWNR model describes a given data set significantly better than the MNR model. The test is based on the sums of squared errors of the models, and the distribution of the GWNR model fit test statistic can be approximated by the F distribution. Applying the test to poverty data in Sulawesi and to the number of infant deaths in East Java shows that the GWNR model is better than the mixed nonparametric regression model in these applications.

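Theorem 3. Suppose $SSE_{MNR} = \mathbf{y}^{T}(\mathbf{I} - \mathbf{L})^{T}(\mathbf{I} - \mathbf{L})\mathbf{y}$ is the sum of squared errors of the MNR model, where $\mathbf{L}$ is the hat matrix of the MNR model, and $SSE_{GWNR} = \mathbf{y}^{T}(\mathbf{I} - \mathbf{G})^{T}(\mathbf{I} - \mathbf{G})\mathbf{y}$ is the sum of squared errors of the GWNR model, where $\mathbf{G}$ is the hat matrix of the GWNR model. Then the distribution of the test statistic $F_1^{*}$ can be approximated by the F distribution with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom.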

Fig. 1. Scatter plots of the response variable against the predictor variables for the first case.

Table 2
Best model selection for the second case.